Article 2322
Title of the article |
Using machine learning for recognition of text patterns of literary sources |
Authors |
Valeriya S. Tomashevskaya, Candidate of engineering sciences, associate professor of the sub-department of corporate information systems, Institute of Information Technologies, MIREA – Russian Technological University (78 Vernadskogo avenue, Moscow, Russia), tomashevskaya@mirea.ru |
Abstract |
Background. Natural language processing technologies in artificial intelligence address problems such as machine translation, text sentiment analysis and text classification. This article considers the application of machine learning and data mining methods to the problem of recognizing text patterns. The object of the study is the types of literary sources. The subject of the research is the classification of literary sources using machine learning methods. The purpose of the work is to compare the effectiveness of machine learning methods in the binary classification of literary sources and to identify the distinctive features of each type of source. Materials and methods. Literary sources were classified using the Naive Bayes classifier and logistic regression, with text represented by the Bag of Words and TF-IDF methods. Results. A comparative analysis of the resulting models was carried out. The model combining logistic regression with the Bag of Words method proved the most effective. Conclusions. Logistic regression with the Bag of Words method was the most effective combination for working with text patterns, while stemming and lemmatization did not affect the final model performance score. The second type of literary sources contains text constructions unique to it, such as “[Electronic resource]” or “date of access”, which increase the chance of correct classification. |
Key words |
natural language processing, machine learning, naive bayes classifier, logistic regression |
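As a rough illustration of the approach named in the abstract, the sketch below pairs a Bag of Words representation with a multinomial Naive Bayes classifier for binary classification of bibliography entries. It is a minimal standard-library sketch, not the authors' implementation: the training entries, the label meanings (0 for a printed source, 1 for an electronic one), and all function names are assumptions made up for the example.

```python
import math
import re
from collections import Counter

# Toy, invented bibliography entries (NOT the authors' data set).
# Assumed labels: 0 = printed source, 1 = electronic source.
TRAIN = [
    ("Ivanov I. I. Machine learning basics. Moscow: Nauka, 2019. 320 p.", 0),
    ("Petrov P. P. Text analysis methods. Saint Petersburg: Piter, 2020. 210 p.", 0),
    ("Sidorov S. S. Data mining [Electronic resource]. "
     "URL: http://example.org/dm (date of access: 01.02.2022).", 1),
    ("Natural language processing [Electronic resource]. "
     "URL: http://example.org/nlp (date of access: 10.03.2022).", 1),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Bag of Words: map each token to its count, ignoring word order."""
    return Counter(tokenize(text))


def train_naive_bayes(samples):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = {}
    vocab = set()
    for text, label in samples:
        class_docs[label] += 1
        bag = bag_of_words(text)
        word_counts.setdefault(label, Counter()).update(bag)
        vocab.update(bag)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word, count in bag_of_words(text).items():
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += count * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
electronic = predict(
    model, "Smirnov A. Deep learning [Electronic resource] (date of access: 05.05.2022)."
)
printed = predict(model, "Kuznetsov K. Statistics. Moscow: Nauka, 2018. 150 p.")
print(electronic, printed)
```

In this toy setting, tokens such as "electronic", "resource", "date" and "access" occur only in the entries labeled 1, so they pull the prediction toward that class, which mirrors the abstract's remark about constructions unique to one type of source.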
For citation: |
Tomashevskaya V.S., Starichkova Yu.V., Yakovlev D.A. Using machine learning for recognition of text patterns of literary sources. Izvestiya vysshikh uchebnykh zavedeniy. Povolzhskiy region. Tekhnicheskie nauki = University proceedings. Volga region. Engineering sciences. 2022;(3):15–26. (In Russ.). doi:10.21685/2072-3059-2022-3-2 |
Last updated: 20.12.2022 12:34